Abstract

Knowledge representation is a major topic in AI, and many studies attempt to represent the entities and relations of a knowledge base in a continuous vector space. Among these attempts, translation-based methods build entity and relation vectors by minimizing the translation loss from a head entity to a tail one. Despite their success, translation-based methods suffer from an oversimplified loss metric and are not competitive enough to model the various and complex entities/relations in knowledge bases. To address this issue, we propose TransA, an adaptive metric approach for embedding that utilizes metric learning ideas to provide a more flexible embedding method. Experiments are conducted on benchmark datasets, and our proposed method makes significant and consistent improvements over the state-of-the-art baselines.


Introduction 


Knowledge graphs such as Wordnet (Miller 1995) and Freebase (Bollacker et al. 2008) play an important role in AI research and applications. Recent research such as query expansion prefers involving knowledge graphs (Bao et al. 2014), while some industrial applications such as question answering robots are also powered by knowledge graphs (Fader, Zettlemoyer, and Etzioni 2014). However, knowledge graphs are symbolic and logical, so numerical machine learning methods can hardly be applied to them. This disadvantage is one of the most important challenges for the usage of knowledge graphs. To provide a general paradigm to support computing on knowledge graphs, various knowledge graph embedding methods have been proposed, such as TransE (Bordes et al. 2013), TransH (Wang et al. 2014) and TransR (Lin et al. 2015).

Embedding is a novel approach to address the representation and reasoning problem for knowledge graphs. It transforms entities and relations into continuous vector spaces, where knowledge graph completion and knowledge classification can be done. Most commonly, a knowledge graph is composed of triples $(h, r, t)$, in which a head entity $h$, a relation $r$ and a tail entity $t$ are presented. Among all the proposed embedding approaches, geometry-based methods are an important branch, yielding the state-of-the-art predictive performance. More specifically, geometry-based embedding methods represent an entity or a relation as a $k$-dimensional vector, then define a score function $f_r(h, t)$ to measure the plausibility of a triple $(h, r, t)$. Such approaches almost all follow the same geometric principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ and apply the same loss metric $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_2^2$, but differ in the relation space where a head entity $h$ connects to a tail entity $t$.

Figure 1: Visualization of TransE embedding vectors for Freebase with PCA dimension reduction. The navy crosses are the matched tail entities for an actor's award nominee, while the red circles are the unmatched ones. TransE applies the Euclidean metric and spherical equipotential surfaces, so it must make seven mistakes, as (a) shows. TransA instead takes advantage of an adaptive Mahalanobis metric and elliptical equipotential surfaces in (b), so four mistakes are avoided.

However, the loss metric in translation-based models is oversimplified. This flaw makes the current embedding methods incompetent to model the various and complex entities/relations in knowledge bases.

Firstly, due to the inflexibility of the loss metric, current translation-based methods apply spherical equipotential hyper-surfaces with different plausibilities, where the nearer a triple is to the centre, the more plausible it is. As illustrated in Fig. 1, spherical equipotential hyper-surfaces are applied in (a), so it is difficult to identify the matched tail entities from the unmatched ones. It is common sense in knowledge graphs that complex relations, such as one-to-many, many-to-one and many-to-many relations, always lead to complex embedding topologies. Though the complex embedding situation is an urgent challenge, spherical equipotential hyper-surfaces are not flexible enough to characterise the topologies, making current translation-based methods incompetent for this task.

Figure 2: Specific illustration of weighting dimensions. The data are selected from Wordnet. The solid dots are correct matches while the circles are not. The arrows indicate the HasPart relation. (a) The incorrect circles are matched, due to the isotropic Euclidean distance. (b) By weighting embedding dimensions, we up-weight the y-axis component of the loss and down-weight the x-axis component of the loss, so the embeddings are refined because the correct ones have smaller loss in the x-axis direction.

Secondly, because of the oversimplified loss metric, current translation-based methods treat each dimension identically. This observation leads to a flaw illustrated in Fig. 2. As each dimension is treated identically in (a), the incorrect entities are matched, because they are closer than the correct ones, measured by isotropic Euclidean distance. Therefore, we have good reason to conjecture that a relation could only be affected by several specific dimensions, while the other unrelated dimensions would be noisy. Treating all the dimensions identically involves much noise and degrades the performance.

Motivated by these two issues, in this paper we propose TransA, an embedding method utilizing an adaptive and flexible metric. First, TransA applies elliptical surfaces instead of spherical surfaces. By this means, complex embedding topologies induced by complex relations can be represented better. Then, as analysed in “Adaptive Metric Approach”, TransA can be treated as weighting transformed feature dimensions. Thus, the noise from unrelated dimensions is suppressed. We demonstrate our ideas in Fig. 1(b) and Fig. 2(b).

To summarize, TransA takes the adaptive metric ideas for better knowledge representation. Our method effectively models various and complex entities/relations in knowledge bases, and outperforms all the state-of-the-art baselines with significant improvements in experiments.

The rest of the paper is organized as follows: we survey the related research and then introduce our approach, along with the theoretical analysis. Next, the experiments are presented, and in the final part we summarize our paper.


Related Work 

We classify prior studies into two lines: one is the 
translation-based embedding methods and the other includes 
many other embedding methods. 


Translation-Based Embedding Methods 

All the translation-based methods share a common principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, but differ in defining the relation-related space where a head entity $h$ connects to a tail entity $t$. This principle indicates that $\mathbf{t}$ should be the nearest neighbour of $(\mathbf{h} + \mathbf{r})$. Hence, the translation-based methods all have the same form of score function, which applies Euclidean distance to measure the loss, as follows:

$$f_r(h, t) = \|\mathbf{h}_r + \mathbf{r} - \mathbf{t}_r\|_2^2$$

where $\mathbf{h}_r, \mathbf{t}_r$ are the entity embedding vectors projected into the relation-specific space. Note that this branch of methods keeps the state-of-the-art performance.


• TransE (Bordes et al. 2013) lays the entities in the original space, say $\mathbf{h}_r = \mathbf{h}$, $\mathbf{t}_r = \mathbf{t}$.

• TransH (Wang et al. 2014) projects the entities into a hyperplane to address the issue of complex relation embedding, say $\mathbf{h}_r = \mathbf{h} - \mathbf{w}_r^\top \mathbf{h}\, \mathbf{w}_r$, $\mathbf{t}_r = \mathbf{t} - \mathbf{w}_r^\top \mathbf{t}\, \mathbf{w}_r$.

• TransR (Lin et al. 2015) transforms the entities by the same matrix, also to address the issue of complex relation embedding, as $\mathbf{h}_r = \mathbf{M}_r \mathbf{h}$, $\mathbf{t}_r = \mathbf{M}_r \mathbf{t}$.


Projecting entities into different hyperplanes or transforming entities by different matrices allows entities to play different roles under different embedding situations. However, as the “Introduction” argues, these methods are incompetent to model complex knowledge graphs well, and perform particularly unsatisfactorily in various and complex entities/relations situations, because of the oversimplified metric.

TransM (Fan et al. 2014) pre-calculates a distinct weight for each training triple to perform better.
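As an illustration of how TransE, TransH and TransR differ only in the projection applied before the shared translation score, the following is a minimal sketch in plain Python. The function names and toy vectors are hypothetical and chosen for this example only; the TransH projection assumes that the hyperplane normal $\mathbf{w}_r$ has unit length.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def project_transe(e: Vector) -> List[float]:
    """TransE: entities stay in the original space (h_r = h, t_r = t)."""
    return list(e)


def project_transh(e: Vector, w_r: Vector) -> List[float]:
    """TransH: e - (w_r^T e) w_r, assuming w_r is a unit normal vector."""
    scale = dot(w_r, e)
    return [x - scale * w for x, w in zip(e, w_r)]


def project_transr(e: Vector, m_r: Matrix) -> List[float]:
    """TransR: M_r e, with M_r given as a list of rows."""
    return [dot(row, e) for row in m_r]


def translation_score(h_r: Vector, r: Vector, t_r: Vector) -> float:
    """Shared score ||h_r + r - t_r||_2^2; lower means more plausible."""
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))
```

In all three cases the final score is the same squared Euclidean distance, which is the isotropic metric that the rest of the paper argues against.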


Other Embedding Methods 

There are also many other models for knowledge graph embedding.

Unstructured Model (UM). The UM (Bordes et al. 2012) is a simplified version of TransE, obtained by setting all the relation vectors to zero, $\mathbf{r} = \mathbf{0}$. Obviously, the relation is not considered in this model.

Structured Embedding (SE). The SE model (Bordes et al. 2011) applies two relation-related matrices, one for the head and the other for the tail. The score function is defined as $f_r(h, t) = \|\mathbf{M}_{h,r}\mathbf{h} - \mathbf{M}_{t,r}\mathbf{t}\|$. According to (Socher et al. 2013), this model cannot capture the relationship among entities and relations.

Single Layer Model (SLM). SLM applies a neural network to knowledge graph embedding. The score function is defined as

$$f_r(h, t) = \mathbf{u}_r^\top g(\mathbf{M}_{r,1}\mathbf{h} + \mathbf{M}_{r,2}\mathbf{t})$$

Note that SLM is a special case of NTN when zero tensors are applied. (Collobert and Weston 2008) proposed a similar method but applied it to the language model.

* The dashed lines indicate the x-axis component of the loss $(h_x + r_x - t_x)$ and the y-axis component of the loss $(h_y + r_y - t_y)$.

Semantic Matching Energy (SME). The SME model (Bordes et al. 2012) (Bordes et al. 2014) attempts to capture the correlations between entities and relations by matrix product and Hadamard product. The score functions are defined as follows:

$$f_r = (\mathbf{M}_1\mathbf{h} + \mathbf{M}_2\mathbf{r} + \mathbf{b}_1)^\top(\mathbf{M}_3\mathbf{t} + \mathbf{M}_4\mathbf{r} + \mathbf{b}_2)$$

$$f_r = (\mathbf{M}_1\mathbf{h} \otimes \mathbf{M}_2\mathbf{r} + \mathbf{b}_1)^\top(\mathbf{M}_3\mathbf{t} \otimes \mathbf{M}_4\mathbf{r} + \mathbf{b}_2)$$

where $\mathbf{M}_1$, $\mathbf{M}_2$, $\mathbf{M}_3$ and $\mathbf{M}_4$ are weight matrices, $\otimes$ is the Hadamard product, and $\mathbf{b}_1$ and $\mathbf{b}_2$ are bias vectors. In some recent work (Bordes et al. 2014), the second form of score function is re-defined with 3-way tensors instead of matrices.

Latent Factor Model (LFM). The LFM (Jenatton et al. 2012) uses the second-order correlations between entities by a quadratic form, defined as $f_r(h, t) = \mathbf{h}^\top \mathbf{W}_r \mathbf{t}$.

Neural Tensor Network (NTN). The NTN model (Socher et al. 2013) defines an expressive score function for graph embedding to join the SLM and LFM:

$$f_r(h, t) = \mathbf{u}_r^\top g(\mathbf{h}^\top \mathbf{W}_{\cdot\cdot r}\mathbf{t} + \mathbf{M}_{r,1}\mathbf{h} + \mathbf{M}_{r,2}\mathbf{t} + \mathbf{b}_r)$$

where $\mathbf{u}_r$ is a relation-specific linear layer, $g(\cdot)$ is the tanh function, and $\mathbf{W}_r$ is a 3-way tensor. However, the high complexity of NTN may degrade its applicability to large-scale knowledge bases.

RESCAL is a collective matrix factorization model and a common embedding method (Nickel, Tresp, and Kriegel 2011) (Nickel, Tresp, and Kriegel 2012).

Semantically Smooth Embedding (SSE). SSE (Guo et al. 2015) aims at leveraging the geometric structure of the embedding space to make entity representations semantically smooth.

(Wang et al. 2014) jointly embeds knowledge and texts. (Wang, Wang, and Guo 2015) involves rules in embedding. (Lin, Liu, and Sun 2015) incorporates the paths of the knowledge graph into embedding.

Adaptive Metric Approach

In this section, we introduce the adaptive metric approach, TransA, and present the theoretical analysis from two perspectives.

Adaptive Metric Score Function

As mentioned in “Introduction”, all the translation-based methods obey the same principle $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$, but they differ in the relation-specific spaces into which entities are projected. Thus, such methods share a similar score function:

$$f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_2^2 = (\mathbf{h} + \mathbf{r} - \mathbf{t})^\top(\mathbf{h} + \mathbf{r} - \mathbf{t}) \qquad (1)$$

This score function is actually the Euclidean metric. The disadvantages of the oversimplified metric have been discussed in “Introduction”. As a consequence, the proposed TransA replaces the inflexible Euclidean distance with an adaptive Mahalanobis distance of absolute loss, because the Mahalanobis distance is more flexible and more adaptive (Wang and Sun 2014). Thus, our score function is as follows:

$$f_r(h, t) = (|\mathbf{h} + \mathbf{r} - \mathbf{t}|)^\top \mathbf{W}_r (|\mathbf{h} + \mathbf{r} - \mathbf{t}|) \qquad (2)$$

where $|\mathbf{h} + \mathbf{r} - \mathbf{t}| = (|h_1 + r_1 - t_1|, |h_2 + r_2 - t_2|, \ldots, |h_n + r_n - t_n|)$ and $\mathbf{W}_r$ is a relation-specific symmetric non-negative weight matrix that corresponds to the adaptive metric. Different from the traditional score functions, we take the absolute value, since we want to measure the absolute loss between $(\mathbf{h} + \mathbf{r})$ and $\mathbf{t}$. Furthermore, we list two main reasons for applying the absolute operator.

On the one hand, the absolute operator makes the score function a well-defined norm under the sole condition that all the entries of $\mathbf{W}_r$ are non-negative. A well-defined norm is necessary for most metric learning scenes (Kulis 2012), and the non-negative condition can be achieved more easily than PSD, so it generalises the common metric learning algebraic form for better rendering the knowledge topologies. We expand our score function as an induced norm $N_r(\mathbf{e}) = \sqrt{f_r(h, t)}$ where $\mathbf{e} = \mathbf{h} + \mathbf{r} - \mathbf{t}$. Obviously, $N_r$ is non-negative, identical and absolutely homogeneous. Besides, with the easy-to-verify inequality $N_r(\mathbf{e}_1 + \mathbf{e}_2) = \sqrt{|\mathbf{e}_1 + \mathbf{e}_2|^\top \mathbf{W}_r |\mathbf{e}_1 + \mathbf{e}_2|} \le \sqrt{|\mathbf{e}_1|^\top \mathbf{W}_r |\mathbf{e}_1|} + \sqrt{|\mathbf{e}_2|^\top \mathbf{W}_r |\mathbf{e}_2|} = N_r(\mathbf{e}_1) + N_r(\mathbf{e}_2)$, the triangle inequality holds. In total, absolute operators make the metric a norm under an easy-to-achieve condition, helping to generalise the representation ability.

On the other hand, in geometry, negative or positive values indicate the downward or upward direction, while in our approach we do not consider this factor. Consider the instance shown in Fig. 2. For the entity Goniff, the x-axis component of its loss vector is negative, thus enlarging this component would make the overall loss smaller, while this case is supposed to make the overall loss larger. As a result, the absolute operator is critical to our approach. For a numerical example without the absolute operator, when the embedding dimension is two, the weight matrix is $[0\ 1;\ 1\ 0]$ and the loss vector is $(\mathbf{h} + \mathbf{r} - \mathbf{t}) = (e_1, e_2)$, the overall loss would be $2 e_1 e_2$. If $e_1 > 0$ and $e_2 < 0$, a much larger absolute value of $e_2$ would reduce the overall loss, and this is not desired.


Perspective from Equipotential Surfaces 

TransA shares almost the same geometric explanations as other translation-based methods, but they differ in the loss metric. For other translation-based methods, the equipotential hyper-surfaces are spheres, as the Euclidean distance defines:

$$\|(\mathbf{t} - \mathbf{h}) - \mathbf{r}\|_2^2 = C \qquad (3)$$

where $C$ is the threshold or the equipotential value. However, for TransA, the equipotential hyper-surfaces are elliptical surfaces, as the Mahalanobis distance of absolute loss states (Kulis 2012):

$$|(\mathbf{t} - \mathbf{h}) - \mathbf{r}|^\top \mathbf{W}_r |(\mathbf{t} - \mathbf{h}) - \mathbf{r}| = C \qquad (4)$$

Note that the elliptical hyper-surfaces would be distorted a bit by the absolute operator, but this makes no difference for analysing the performance of TransA. As we know, different equipotential hyper-surfaces correspond to different thresholds, and different thresholds decide whether the triples are correct or not. Due to the practical situation that our knowledge base is large-scale and very complex, the topologies of embedding cannot be distributed as uniformly as spheres, as justified by Fig. 1. Thus, replacing the spherical equipotential hyper-surfaces with elliptical ones would enhance the embedding.

As Fig. 1 illustrates, TransA would perform better for one-to-many relations. The metric of TransA is symmetric, so it is reasonable that TransA would also perform better for many-to-one relations. Moreover, a many-to-many relation could be treated as both a many-to-one and a one-to-many relation. Generally, TransA would perform better for all the complex relations.
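To make the contrast between the spherical surfaces of Eq. (3) and the elliptical surfaces of Eq. (4) concrete, here is a small sketch in plain Python. The weight matrix, the candidate tails and the function names are toy values assumed for illustration; they are not taken from the experiments.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def loss_vector(h: Vector, r: Vector, t: Vector) -> List[float]:
    """Component-wise translation loss h + r - t."""
    return [a + b - c for a, b, c in zip(h, r, t)]


def euclidean_score(h: Vector, r: Vector, t: Vector) -> float:
    """Squared Euclidean distance, as in Eq. (3): spherical level sets."""
    return sum(x * x for x in loss_vector(h, r, t))


def adaptive_score(h: Vector, r: Vector, t: Vector, w: Matrix) -> float:
    """|h + r - t|^T W |h + r - t|, as in Eq. (4): elliptical level sets."""
    e = [abs(x) for x in loss_vector(h, r, t)]
    n = len(e)
    return sum(e[i] * w[i][j] * e[j] for i in range(n) for j in range(n))


# Toy setting (hypothetical values): two candidate tails for the same (h, r).
H, R = [0.0, 0.0], [1.0, 0.0]
TAIL_A = [1.0, 0.8]   # off only along the y-axis
TAIL_B = [1.6, 0.0]   # off only along the x-axis
W = [[4.0, 0.0], [0.0, 0.25]]  # assumed weights: x-axis error costs more
```

With these assumed values the Euclidean score prefers `TAIL_B` (0.36 against 0.64), while the adaptive score prefers `TAIL_A`, because the weight matrix stretches the level sets along the y-axis. With the identity matrix as `W`, the two scores coincide.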

Perspective from Feature Weighting

TransA can be regarded as weighting transformed features. For a weight matrix $\mathbf{W}_r$ that is symmetric, we obtain the equivalent unique form by LDL Decomposition (Golub and Van Loan 2012) as follows:

$$\mathbf{W}_r = \mathbf{L}_r^\top \mathbf{D}_r \mathbf{L}_r \qquad (5)$$

$$f_r = (\mathbf{L}_r |\mathbf{h} + \mathbf{r} - \mathbf{t}|)^\top \mathbf{D}_r (\mathbf{L}_r |\mathbf{h} + \mathbf{r} - \mathbf{t}|) \qquad (6)$$

In the above equations, $\mathbf{L}_r$ can be viewed as a transformation matrix, which transforms the loss vector $|\mathbf{h} + \mathbf{r} - \mathbf{t}|$ to another space. Furthermore, $\mathbf{D}_r = \mathrm{diag}(w_1, w_2, w_3, \ldots)$ is a diagonal matrix and different embedding dimensions are weighted by $w_i$.

As analysed in “Introduction”, a relation could only be affected by several specific dimensions while the other dimensions would be noisy. Treating different dimensions identically in current translation-based methods can hardly suppress the noise, consequently yielding an unsatisfactory performance. We believe that different dimensions play different roles, particularly when entities are distributed divergently. Unlike existing methods, TransA can automatically learn the weights from the data. This may explain why TransA outperforms TransR although both TransA and TransR transform the entity space with matrices.

Connection to Previous Works

Regarding TransR, which rotates and scales the embedding spaces, TransA holds two advantages over it. Firstly, we weight feature dimensions to avoid the noise. Secondly, we loosen the PSD condition for a flexible representation. Regarding TransM, which weights feature dimensions using pre-computed coefficients, TransA holds two advantages over it. Firstly, we learn the weights from the data, which makes the score function more adaptive. Secondly, we apply the feature transformation, which makes the embedding more effective.

Training Algorithm

To train the model, we use the margin-based ranking error. Taking other constraints into account, the target function can be defined as follows:

$$\min \sum_{(h,r,t)\in\Delta}\ \sum_{(h',r',t')\in\Delta'} \left[f_r(h,t) + \gamma - f_{r'}(h',t')\right]_+ + \lambda\left(\sum_{r\in R}\|\mathbf{W}_r\|_F^2\right) + C\left(\sum_{e\in E}\|\mathbf{e}\|_2^2 + \sum_{r\in R}\|\mathbf{r}\|_2^2\right)$$

$$\text{s.t.}\quad [\mathbf{W}_r]_{ij} \ge 0 \qquad (7)$$

where $[\,\cdot\,]_+ = \max(0, \cdot)$, $\Delta$ is the set of golden triples and $\Delta'$ is the set of incorrect ones, and $\gamma$ is the margin that separates the positive and negative triples. $\|\cdot\|_F$ is the F-norm of a matrix. $C$ controls the scaling degree, and $\lambda$ controls the regularization of the adaptive weight matrix. $E$ is the set of entities and $R$ is the set of relations. At each round of the training process, $\mathbf{W}_r$ can be worked out directly by setting the derivative to zero. Then, in order to ensure the non-negative condition of $\mathbf{W}_r$, we set all the negative entries of $\mathbf{W}_r$ to zero.

$$\mathbf{W}_r = -\sum_{(h,r,t)\in\Delta} \left(|\mathbf{h} + \mathbf{r} - \mathbf{t}|\,|\mathbf{h} + \mathbf{r} - \mathbf{t}|^\top\right) + \sum_{(h',r',t')\in\Delta'} \left(|\mathbf{h}' + \mathbf{r}' - \mathbf{t}'|\,|\mathbf{h}' + \mathbf{r}' - \mathbf{t}'|^\top\right) \qquad (8)$$

As to the complexity of our model, the weight matrix is completely calculated from the existing embedding vectors, which means TransA has almost the same number of free parameters as TransE. As to the efficiency of our model, the weight matrix has a closed-form solution, which speeds up the training process to a large extent.
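The closed-form weight update of Eq. (8), followed by zeroing the negative entries, can be sketched as below for a single relation. This is a simplified illustration in plain Python under stated assumptions: the function names are hypothetical, the update mirrors Eq. (8) as printed, and the regularization terms of Eq. (7) are left out of the loss.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[Vector, Vector, Vector]  # embeddings of (h, r, t)
Matrix = List[List[float]]


def abs_loss(h: Vector, r: Vector, t: Vector) -> List[float]:
    """Absolute loss vector |h + r - t|."""
    return [abs(a + b - c) for a, b, c in zip(h, r, t)]


def update_weight_matrix(positives: Sequence[Triple],
                         negatives: Sequence[Triple],
                         dim: int) -> Matrix:
    """Eq. (8): subtract outer products over golden triples, add them over
    corrupted triples, then set negative entries to zero."""
    w = [[0.0] * dim for _ in range(dim)]
    for sign, triples in ((-1.0, positives), (1.0, negatives)):
        for h, r, t in triples:
            e = abs_loss(h, r, t)
            for i in range(dim):
                for j in range(dim):
                    w[i][j] += sign * e[i] * e[j]
    return [[max(0.0, x) for x in row] for row in w]


def adaptive_score(h: Vector, r: Vector, t: Vector, w: Matrix) -> float:
    """Score of Eq. (2): |h + r - t|^T W_r |h + r - t|."""
    e = abs_loss(h, r, t)
    n = len(e)
    return sum(e[i] * w[i][j] * e[j] for i in range(n) for j in range(n))


def margin_ranking_loss(pos: Triple, neg: Triple, w: Matrix,
                        gamma: float) -> float:
    """Hinge term [f(pos) + gamma - f(neg)]_+ for one pair of triples."""
    return max(0.0, adaptive_score(*pos, w) + gamma - adaptive_score(*neg, w))
```

Because the update only accumulates outer products of loss vectors that are already available, it adds no free parameters beyond the entity and relation embeddings.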


Experiments 

We evaluate the proposed model on two benchmark tasks: link prediction and triples classification. Experiments are conducted on four public datasets that are subsets of Wordnet and Freebase. The statistics of these datasets are listed in Tab. 1.

ATPE is short for “Averaged Triple number Per Entity”. This quantity measures the diversity and complexity of datasets. Commonly, more triples lead to more complex structures of the knowledge graph. To express the more complex structures, entities would be distributed variously and complexly. Overall, embedding methods produce less satisfactory results on the datasets with higher ATPE, because a large ATPE means a various and complex entities/relations embedding situation.
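The ATPE values can be reproduced from the dataset statistics by summing the training, validation and test triples and dividing by the number of entities. The short sketch below does this in plain Python; the dictionary layout and function name are chosen for this example only, while the counts are the ones reported in the dataset statistics table.

```python
from typing import Dict

# Dataset statistics as reported in the statistics table.
DATASETS: Dict[str, Dict[str, int]] = {
    "WN18":  {"entities": 40943, "train": 141442, "valid": 5000,  "test": 5000},
    "FB15K": {"entities": 14951, "train": 483142, "valid": 50000, "test": 59071},
    "WN11":  {"entities": 38696, "train": 112581, "valid": 2609,  "test": 10544},
    "FB13":  {"entities": 75043, "train": 316232, "valid": 5908,  "test": 23733},
}


def atpe(stats: Dict[str, int]) -> float:
    """Averaged Triple number Per Entity: all triples divided by entities."""
    triples = stats["train"] + stats["valid"] + stats["test"]
    return triples / stats["entities"]


ATPE = {name: round(atpe(stats), 2) for name, stats in DATASETS.items()}
```

Rounded to two decimals, this gives 3.70 for WN18, 39.61 for FB15K, 3.25 for WN11 and 4.61 for FB13, so FB15K has roughly ten times as many triples per entity as the other datasets.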


Link Prediction 

Link prediction aims to predict a missing entity given the other entity and the relation. In this task, we predict $t$ given $(h, r, *)$, or predict $h$ given $(*, r, t)$. The WN18 and FB15K datasets are the benchmark datasets for this task.

Table 1: Statistics of datasets

| Data | WN18 | FB15K | WN11 | FB13 |
|---|---|---|---|---|
| #Rel | 18 | 1,345 | 11 | 13 |
| #Ent | 40,943 | 14,951 | 38,696 | 75,043 |
| #Train | 141,442 | 483,142 | 112,581 | 316,232 |
| #Valid | 5,000 | 50,000 | 2,609 | 5,908 |
| #Test | 5,000 | 59,071 | 10,544 | 23,733 |
| ATPE | 3.70 | 39.61 | 3.25 | 4.61 |

Table 2: Evaluation results on link prediction

| Method | WN18 Mean Rank (Raw) | WN18 Mean Rank (Filter) | WN18 HITS@10 % (Raw) | WN18 HITS@10 % (Filter) | FB15K Mean Rank (Raw) | FB15K Mean Rank (Filter) | FB15K HITS@10 % (Raw) | FB15K HITS@10 % (Filter) |
|---|---|---|---|---|---|---|---|---|
| SE (Bordes et al. 2011) | 1,011 | 985 | 68.5 | 80.5 | 273 | 162 | 28.8 | 39.8 |
| SME (Bordes et al. 2012) | 545 | 533 | 65.1 | 74.1 | 274 | 154 | 30.7 | 40.8 |
| LFM (Jenatton et al. 2012) | 469 | 456 | 71.4 | 81.6 | 283 | 164 | 26.0 | 33.1 |
| TransE (Bordes et al. 2013) | 263 | 251 | 75.4 | 89.2 | 243 | 125 | 34.9 | 47.1 |
| TransH (Wang et al. 2014) | 401 | 388 | 73.0 | 82.3 | 212 | 87 | 45.7 | 64.4 |
| TransR (Lin et al. 2015) | 238 | 225 | 79.8 | 92.0 | 198 | 77 | 48.2 | 68.7 |
| Adaptive Metric (PSD) | 289 | 278 | 77.6 | 89.6 | 172 | 88 | 52.4 | 74.2 |
| TransA | 405 | 392 | 82.3 | 94.3 | 155 | 74 | 56.1 | 80.4 |

Evaluation Protocol. We follow the same protocol as used in TransE (Bordes et al. 2013), TransH (Wang et al. 2014) and TransR (Lin et al. 2015). For each testing triple $(h, r, t)$, we replace the tail $t$ by every entity $e$ in the knowledge graph and calculate a dissimilarity score with the score function $f_r(h, e)$ for the corrupted triple $(h, r, e)$. Ranking these scores in ascending order, we then get the rank of the original correct triple. There are two metrics for evaluation: the averaged rank (Mean Rank) and the proportion of testing triples whose ranks are not larger than 10 (HITS@10). This is called the “Raw” setting. When we filter out the corrupted triples that exist in the training, validation or test datasets, this is the “Filter” setting. If a corrupted triple exists in the knowledge graph, ranking it before the original triple is acceptable. To eliminate this issue, the “Filter” setting is preferred. In both settings, a lower Mean Rank or a higher HITS@10 is better.
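The ranking protocol can be sketched as follows for the tail-prediction case. This is a simplified illustration in plain Python, not the evaluation code used for the reported results: the function names are hypothetical, scores are assumed to be precomputed per candidate entity, and ties are resolved in favour of the correct triple.

```python
from typing import Dict, FrozenSet, Sequence


def rank_of_true_tail(scores: Dict[str, float], true_tail: str,
                      known_tails: FrozenSet[str] = frozenset()) -> int:
    """Rank of the correct tail among all candidate entities.

    scores maps each candidate entity e to f_r(h, e); lower is better.
    With an empty known_tails this is the "Raw" setting. Passing the other
    tails that form existing triples with (h, r) gives the "Filter" setting.
    """
    true_score = scores[true_tail]
    better = sum(
        1 for entity, score in scores.items()
        if entity != true_tail
        and entity not in known_tails
        and score < true_score
    )
    return better + 1


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank over all testing triples."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Percentage of testing triples whose rank is not larger than k."""
    return 100.0 * sum(1 for rank in ranks if rank <= k) / len(ranks)
```

For example, with scores `{"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}` and correct tail `"c"`, the raw rank is 3. If `"a"` is itself a valid tail for the same head and relation, the filtered rank is 2.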

ATPE: Averaged Triple number Per Entity. Triples are summed over #Train, #Valid and #Test.

Implementation. As the datasets are the same, we directly copy the experimental results of several baselines from the literature, as in (Bordes et al. 2013), (Wang et al. 2014) and (Lin et al. 2015). We have tried several settings on the validation dataset to get the best configuration for both Adaptive Metric (PSD) and TransA. Under the “bern.” sampling strategy, the optimal configurations are: learning rate $\alpha = 0.001$, embedding dimension $k = 50$, $\gamma = 2.0$, $C = 0.2$ on WN18; $\alpha = 0.002$, $k = 200$, $\gamma = 3.2$, and $C = 0.2$ on FB15K.

Results. Evaluation results on WN18 and FB15K are reported in Tab. 2 and Tab. 3, respectively. We can conclude that:

1. TransA outperforms all the baselines significantly and consistently. This result justifies the effectiveness of TransA.


2. FB15K presents a very various and complex entities/relations embedding situation, because its ATPE is by far the highest among all the datasets. Nevertheless, TransA performs better than the other baselines on this dataset, indicating that TransA performs better in various and complex entities/relations embedding situations. WN18 may be less complex than FB15K because of its smaller ATPE. Compared to TransE, the relative improvement of TransA on WN18 is 5.7% while that on FB15K is 95.2%. This comparison shows TransA has more advantages in the various and complex embedding environment.

3. TransA promotes the performance for 1-1 relations, which means TransA generally promotes the performance on simple relations. TransA also promotes the performance for 1-N, N-1 and N-N relations, which demonstrates that TransA works better for complex relation embedding.

4. Compared to TransR, the better performance of TransA means that the feature weighting and the generalised metric form led by absolute operators have significant benefits, as analysed.

5. Compared to Adaptive Metric (PSD), which applies the score function $f_r(h, t) = (\mathbf{h} + \mathbf{r} - \mathbf{t})^\top \mathbf{W}_r (\mathbf{h} + \mathbf{r} - \mathbf{t})$ and constrains $\mathbf{W}_r$ to be PSD, TransA is more competent, because our score function with the non-negative matrix condition and absolute operator produces a more flexible representation than that with the PSD matrix condition does, as analysed in “Adaptive Metric Approach”.

6. TransA performs badly in Mean Rank on the WN18 dataset. Digging into the detailed situation, we discover there are 27 testing triples (0.54% of the testing set) whose ranks are more than 30,000, and these few cases alone account for about 162 of mean rank loss. The tail or head entity of each of these triples has never co-occurred with the corresponding relation in the training set. It is the insufficient training data that leads to the over-distorted weight matrix, and the over-distorted weight matrix is responsible for the bad Mean Rank.
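The figures quoted for the WN18 outliers follow from simple arithmetic over the 5,000 WN18 testing triples. The sketch below reproduces them in plain Python; the function name is hypothetical, and 30,000 is used as a lower bound on the rank of each outlier.

```python
WN18_TEST_SIZE = 5000  # number of WN18 testing triples


def mean_rank_contribution(num_triples: int, rank: int, test_size: int) -> float:
    """Amount that num_triples triples, each ranked at `rank`, add to the mean rank."""
    return num_triples * rank / test_size


# 27 outliers out of 5,000 testing triples, as a percentage.
outlier_share_percent = round(100.0 * 27 / WN18_TEST_SIZE, 2)

# Each outlier has a rank above 30,000, so this is a lower bound.
mean_rank_loss = mean_rank_contribution(27, 30000, WN18_TEST_SIZE)
```

This gives 0.54% of the testing set and a contribution of at least 162 to the Mean Rank, which matches the numbers stated in the analysis.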


Mapping properties of relations follow the same rules as in (Bordes et al. 2013).

































































Table 3: Evaluation results on FB15K by mapping properties of relations (%)

| Method | Predicting Head (HITS@10): 1-1 | 1-N | N-1 | N-N | Predicting Tail (HITS@10): 1-1 | 1-N | N-1 | N-N |
|---|---|---|---|---|---|---|---|---|
| SE (Bordes et al. 2011) | 35.6 | 62.6 | 17.2 | 37.5 | 34.9 | 14.6 | 68.3 | 41.3 |
| SME (Bordes et al. 2012) | 35.1 | 53.7 | 19.0 | 40.3 | 32.7 | 14.9 | 61.6 | 43.3 |
| TransE (Bordes et al. 2013) | 43.7 | 65.7 | 18.2 | 47.2 | 43.7 | 19.7 | 66.7 | 50.0 |
| TransH (Wang et al. 2014) | 66.8 | 87.6 | 28.7 | 64.5 | 65.5 | 39.8 | 83.3 | 67.2 |
| TransR (Lin et al. 2015) | 78.8 | 89.2 | 34.1 | 69.2 | 79.2 | 37.4 | 90.4 | 72.1 |
| TransA | 86.8 | 95.4 | 42.7 | 77.8 | 86.7 | 54.3 | 94.4 | 80.6 |


Table 4: Triples classification: accuracies (%) for different embedding methods

| Methods | WN11 | FB13 | Avg. |
|---|---|---|---|
| LFM | 73.8 | 84.3 | 79.0 |
| NTN | 70.4 | 87.1 | 78.8 |
| TransE | 75.9 | 81.5 | 78.7 |
| TransH | 78.8 | 83.3 | 81.1 |
| TransR | 85.9 | 82.5 | 84.2 |
| Adaptive Metric (PSD) | 81.4 | 87.1 | 84.3 |
| TransA | 83.2 | 87.3 | 85.3 |


Triples Classification 

Triples classification is a classical task in knowledge base embedding, which aims at predicting whether a given triple $(h, r, t)$ is correct or not. Our evaluation protocol is the same as in prior studies. WN11 and FB13 are the benchmark datasets for this task. Evaluation of classification needs negative labels. The datasets have already been built with negative triples, where each correct triple is corrupted to get one negative triple.

Evaluation Protocol. The decision rule is as follows: for a triple $(h, r, t)$, if $f_r(h, t)$ is below a threshold $\sigma_r$, then it is classified as positive; otherwise negative. The thresholds $\{\sigma_r\}$ are determined on the validation dataset. The final accuracy is based on how many triples are classified correctly.
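The decision rule and the selection of a per-relation threshold on validation data can be sketched as below. The search over midpoints between observed scores is an assumption made for this example; the text only states that thresholds are determined on the validation dataset, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Scored = Tuple[float, bool]  # (score f_r(h, t), True if the triple is correct)


def accuracy(samples: Sequence[Scored], threshold: float) -> float:
    """Fraction of triples classified correctly: score below threshold => positive."""
    correct = sum(1 for score, is_positive in samples
                  if (score < threshold) == is_positive)
    return correct / len(samples)


def best_threshold(samples: Sequence[Scored]) -> float:
    """Pick the candidate threshold with the highest validation accuracy.

    Candidates are the midpoints between consecutive distinct scores, plus
    one value below the minimum and one above the maximum (an assumed,
    simple search strategy).
    """
    scores: List[float] = sorted({score for score, _ in samples})
    candidates = [scores[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(scores, scores[1:])]
    candidates.append(scores[-1] + 1.0)
    return max(candidates, key=lambda c: accuracy(samples, c))
```

In use, `best_threshold` would be called once per relation on that relation's validation triples, and the resulting threshold applied to the test triples with `accuracy`.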

Implementation. As all methods use the same datasets, we directly copy the results of different methods from the literature. We have tried several settings on the validation dataset to get the best configuration for both Adaptive Metric (PSD) and TransA. The optimal configurations are: “bern” sampling, $\alpha = 0.02$, $k = 50$, $\gamma = 10.0$, $C = 0.2$ on WN11, and “bern” sampling, $\alpha = 0.002$, $k = 200$, $\gamma = 3.0$, $C = 0.00002$ on FB13.

Figure 3: Triples classification accuracies for each relation on WN11 (left) and FB13 (right). The “weight difference” is worked out by the scaled difference between maximal and median weight.

Results. Accuracies are reported in Tab. 4 and Fig. 3. According to the “Adaptive Metric Approach” section, we can work out the weights by LDL Decomposition for each relation. Because the minimal weight is too small to make a significant analysis, we choose the median one to represent the relatively small weight. Thus, “Weight Difference” is calculated by $(w_{\max} - w_{\mathrm{median}}) / w_{\mathrm{median}}$. The bigger the weight difference is, the more significant the effect of the feature weighting. Notably, scaling by the median weight makes the weight differences comparable to each other. We observe that:

1. Overall, TransA yields the best average accuracy, illustrating the effectiveness of TransA.

2. Accuracies vary with the weight difference, meaning the feature weighting benefits the accuracies. This supports the theoretical analysis and the effectiveness of TransA.

3. Compared to Adaptive Metric (PSD), TransA performs better, because our score function with the non-negative matrix condition and absolute operator leads to a more flexible representation than that with the PSD matrix condition does.
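The weight analysis can be illustrated with a small sketch that extracts the diagonal weights of a symmetric matrix by LDL decomposition and then computes the scaled difference between the maximal and median weight. Several points are assumptions of this example: it uses the textbook factorization $\mathbf{W} = \mathbf{L}\mathbf{D}\mathbf{L}^\top$ with a unit lower-triangular $\mathbf{L}$ (the transpose of the convention in Eq. (5)), it uses no pivoting and so requires nonzero pivots, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def ldl_decompose(w: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float]]:
    """Factor a symmetric matrix as W = L D L^T without pivoting.

    Returns the unit lower-triangular L and the diagonal of D.
    Assumes every pivot d[j] is nonzero.
    """
    n = len(w)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = w[j][j] - sum(lower[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            acc = sum(lower[i][k] * lower[j][k] * d[k] for k in range(j))
            lower[i][j] = (w[i][j] - acc) / d[j]
    return lower, d


def weight_difference(weights: Sequence[float]) -> float:
    """Scaled difference between the maximal and the median weight."""
    mid = median(weights)
    return (max(weights) - mid) / mid
```

For the toy matrix `[[4.0, 2.0], [2.0, 3.0]]` the diagonal weights are 4.0 and 2.0, and for the weights `[1.0, 2.0, 8.0]` the weight difference is 3.0.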

Conclusion 

In this paper, we propose TransA, a translation-based knowledge graph embedding method with an adaptive and flexible metric. TransA applies elliptical equipotential hyper-surfaces to characterise the embedding topologies and weights several specific feature dimensions for a relation to avoid much noise. Thus, our adaptive metric approach can effectively model various and complex entities/relations in knowledge bases. Experiments are conducted with two benchmark tasks, and the results show that TransA achieves consistent and significant improvements over the current state-of-the-art baselines. To reproduce our results, our code and data will be published on GitHub.